{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Structure Learning in Bayesian Networks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we show examples for using the Structure Learning Algorithms in pgmpy. Currently, pgmpy has implementation of 3 main algorithms:\n", "1. PC with stable and parallel variants.\n", "2. Hill-Climb Search\n", "3. Exhaustive Search\n", "\n", "For PC the following conditional independence test can be used:\n", "1. Chi-Square test (https://en.wikipedia.org/wiki/Chi-squared_test)\n", "2. Pearsonr (https://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression)\n", "3. G-squared (https://en.wikipedia.org/wiki/G-test)\n", "4. Log-likelihood (https://en.wikipedia.org/wiki/G-test)\n", "5. Freeman-Tuckey (Read, Campbell B. \"Freeman—Tukey chi-squared goodness-of-fit statistics.\" Statistics & probability letters 18.4 (1993): 271-278.)\n", "6. Modified Log-likelihood\n", "7. Neymann (https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma)\n", "8. Cressie Read (Cressie, Noel, and Timothy RC Read. \"Multinomial goodness‐of‐fit tests.\" Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464)\n", "9. Power Divergence (Cressie, Noel, and Timothy RC Read. \"Multinomial goodness‐of‐fit tests.\" Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.)\n", "\n", "For Hill-Climb and Exhausitive Search the following scoring methods can be used:\n", "1. K2 Score\n", "2. BDeu Score\n", "3. Bic Score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate some data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from itertools import combinations\n", "\n", "import networkx as nx\n", "from sklearn.metrics import f1_score\n", "\n", "from pgmpy.estimators import PC, HillClimbSearch, ExhaustiveSearch\n", "from pgmpy.estimators import K2Score\n", "from pgmpy.utils import get_example_model\n", "from pgmpy.sampling import BayesianModelSampling" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Generating for node: CVP: 100%|██████████| 37/37 [00:00<00:00, 544.13it/s]\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
HISTORYCVPPCWPHYPOVOLEMIALVEDVOLUMELVFAILURESTROKEVOLUMEERRLOWOUTPUTHRBPHREKG...MINVOLSETVENTMACHVENTTUBEVENTLUNGVENTALVARTCO2CATECHOLHRCOBP
0TRUELOWLOWFALSELOWTRUELOWFALSEHIGHNORMAL...NORMALNORMALLOWZEROZEROHIGHHIGHHIGHLOWLOW
1FALSENORMALNORMALFALSENORMALFALSENORMALFALSEHIGHHIGH...NORMALNORMALLOWZEROLOWHIGHHIGHHIGHHIGHHIGH
2FALSENORMALNORMALFALSENORMALFALSENORMALFALSEHIGHHIGH...LOWLOWZEROZEROZEROHIGHHIGHHIGHHIGHHIGH
3FALSENORMALNORMALFALSENORMALFALSEHIGHFALSEHIGHHIGH...HIGHHIGHHIGHLOWHIGHLOWHIGHHIGHHIGHHIGH
4FALSENORMALNORMALFALSENORMALFALSENORMALFALSEHIGHHIGH...NORMALNORMALZEROHIGHLOWHIGHHIGHHIGHHIGHLOW
\n", "

5 rows × 37 columns

\n", "
" ], "text/plain": [ " HISTORY CVP PCWP HYPOVOLEMIA LVEDVOLUME LVFAILURE STROKEVOLUME \\\n", "0 TRUE LOW LOW FALSE LOW TRUE LOW \n", "1 FALSE NORMAL NORMAL FALSE NORMAL FALSE NORMAL \n", "2 FALSE NORMAL NORMAL FALSE NORMAL FALSE NORMAL \n", "3 FALSE NORMAL NORMAL FALSE NORMAL FALSE HIGH \n", "4 FALSE NORMAL NORMAL FALSE NORMAL FALSE NORMAL \n", "\n", " ERRLOWOUTPUT HRBP HREKG ... MINVOLSET VENTMACH VENTTUBE VENTLUNG \\\n", "0 FALSE HIGH NORMAL ... NORMAL NORMAL LOW ZERO \n", "1 FALSE HIGH HIGH ... NORMAL NORMAL LOW ZERO \n", "2 FALSE HIGH HIGH ... LOW LOW ZERO ZERO \n", "3 FALSE HIGH HIGH ... HIGH HIGH HIGH LOW \n", "4 FALSE HIGH HIGH ... NORMAL NORMAL ZERO HIGH \n", "\n", " VENTALV ARTCO2 CATECHOL HR CO BP \n", "0 ZERO HIGH HIGH HIGH LOW LOW \n", "1 LOW HIGH HIGH HIGH HIGH HIGH \n", "2 ZERO HIGH HIGH HIGH HIGH HIGH \n", "3 HIGH LOW HIGH HIGH HIGH HIGH \n", "4 LOW HIGH HIGH HIGH HIGH LOW \n", "\n", "[5 rows x 37 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = get_example_model(\"alarm\")\n", "samples = BayesianModelSampling(model).forward_sample(size=int(1e3))\n", "samples.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Funtion to evaluate the learned model structures.\n", "def get_f1_score(estimated_model, true_model):\n", " nodes = estimated_model.nodes()\n", " est_adj = nx.to_numpy_array(\n", " estimated_model.to_undirected(), nodelist=nodes, weight=None\n", " )\n", " true_adj = nx.to_numpy_array(\n", " true_model.to_undirected(), nodelist=nodes, weight=None\n", " )\n", "\n", " f1 = f1_score(np.ravel(true_adj), np.ravel(est_adj))\n", " print(\"F1-score for the model skeleton: \", f1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learn the model structure using PC" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Working for n conditional variables: 4: 100%|██████████| 4/4 [00:11<00:00, 2.89s/it]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "F1-score for the model skeleton: 0.7777777777777779\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "est = PC(data=samples)\n", "estimated_model = est.estimate(variant=\"stable\", max_cond_vars=4)\n", "get_f1_score(estimated_model, model)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Working for n conditional variables: 4: 100%|██████████| 4/4 [00:11<00:00, 2.88s/it]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "F1-score for the model skeleton: 0.7777777777777779\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "est = PC(data=samples)\n", "estimated_model = est.estimate(variant=\"orig\", max_cond_vars=4)\n", "get_f1_score(estimated_model, model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learn the model structure using Hill-Climb Search" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 1%| | 55/10000 [00:16<50:30, 3.28it/s] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "F1-score for the model skeleton: 0.84\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "scoring_method = K2Score(data=samples)\n", "est = HillClimbSearch(data=samples)\n", "estimated_model = est.estimate(\n", " scoring_method=scoring_method, max_indegree=4, max_iter=int(1e4)\n", ")\n", "get_f1_score(estimated_model, model)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }